(54) Document image assessment system and method



(57) A system and method in accordance with the present invention
include a scanning assembly and a storage device coupled to a
programmed computer with a set of instructions for carrying out an
assessment of a document image. The system and method operate by:
processing the document image to obtain one or more attributes related
to the geometrical integrity of the document image; selecting a
threshold value from a database for each of the obtained attributes;
and then comparing each of the obtained attributes against the
threshold value selected for the obtained attribute to determine a
difference for each and then evaluating one or more of the differences
using predetermined criteria to provide evaluation results of the
geometrical integrity of the document image. The system and method may
also operate to: process the document image to obtain attributes
related to line skew, average character confidence, expected contrast,
and sharpness in the document image; select a threshold value from a
database for each of the obtained attributes; and compare each of the
obtained attributes against the threshold value selected for the
obtained attribute to determine the difference for each and then
evaluate one or more of the differences using predetermined criteria to
provide evaluation results of the condition of the text of the document
image and of the condition of the image with respect to a fixed
reference.
of rectangularity;

FIG. 2(b) is a perspective view of a frame of information with a
document image;
FIG. 3(a) is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of linearity;
FIG. 3(b) is a perspective view of a frame of information with a
document image;
FIG. 3(c) is a perspective view of another frame of information with
another document image;
FIG. 4(a) is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of corner location;
FIG. 4(b) is a perspective view of a frame of information with a
document image;
FIG. 5 is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of expected fields;
FIG. 6(a) is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of sheet skew detection;
FIG. 6(b) is a perspective view of a frame of information with a
document image;
FIG. 7(a) is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of line skew;

FIG. 7(b) is a perspective view of an initial position of four
line-finding probes with respect to a document image in a frame of
information;
FIG. 7(c) is a perspective view of intermediate positions of four
line-finding probes with respect to the document image of FIG. 7(b);
FIG. 7(d) is a perspective view of a final position of four
line-finding probes with respect to the document image of FIG. 7(c);
FIG. 8 is a flow chart illustrating the process for obtaining,
comparing, and evaluating the attribute of average character
confidence;
FIG. 9 is a flow chart illustrating the process for obtaining and
comparing the attribute of expected contrast; and
FIG. 10 is a flow chart illustrating the process for obtaining and
comparing the attribute of sharpness.

DETAILED DESCRIPTION 

A document image assessment system 20 and method in accordance with
one embodiment of the present invention are illustrated in FIGS. 1(a)
and 1(b) respectively. System 20 includes a scanner assembly or high
speed imaging device 22 and a storage device 24 which are all coupled
to a programmed computer 28. System 20 and method operate by
processing a document image to obtain one or more attributes related to
the document image (Step 30), selecting a threshold value from a
database for each of the obtained attributes (Step 32), and then
comparing each of the obtained attributes against the threshold value
selected for the obtained attribute to determine a difference between
them for each and then using predetermined criteria to evaluate one or
more of the differences to provide evaluation results of the
geometrical integrity of the document image. System 20 and method can
also be adapted to process a document image to obtain attributes
related to the condition of text in the document image and/or the
condition of the document image with respect to a frame of reference.
With system 20 and method, the throughput of scanner assembly 22 can be
maintained while still obtaining evaluation results of the condition of
each document image.
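
The three steps above (obtaining attributes, selecting thresholds, and evaluating differences) can be sketched in Python as follows. This is only an illustrative sketch and not the claimed implementation; the names `assess`, `THRESHOLDS`, and `TOLERANCES` are hypothetical, and the sample values are borrowed from the linearity and line skew examples in this description.

```python
# Hypothetical threshold database and acceptance criteria (illustrative values).
THRESHOLDS = {"linearity": 0.0, "line_skew": 0.0}
TOLERANCES = {"linearity": 2.0, "line_skew": 0.5}


def assess(attributes):
    """Compare each obtained attribute with its threshold (Steps 32 and 34)."""
    results = {}
    for name, value in attributes.items():
        threshold = THRESHOLDS[name]          # Step 32: select the threshold
        difference = abs(value - threshold)   # Step 34: determine the difference
        # Step 34: apply the predetermined criteria to the difference
        results[name] = difference < TOLERANCES[name]
    return results
```

Each entry in the returned dictionary is True when the attribute is acceptable under the assumed criteria.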

Referring more specifically to FIG. 1(a), system 20 includes scanner
assembly 22 which has a document feeder 23, a scanner 25, and a
document transport mechanism 27. Documents are loaded into document
feeder 23 which places each document on document transport mechanism
27 which has a known uniform background. Scanner 25 scans each
document against the background of document transport mechanism 27 and
captures a document image of the document in a frame of information
which is larger than the size of the document. Each frame of
information generated by scanner 25 is represented by a number of rows
and columns of pixel data. Each pixel in the pixel data has a grey
scale value between 0 and 255 represented in analog form. The
background of document transport mechanism 27 will have a pixel value
which will be known by programmed computer 28. As a result, the
computer 28 will be able to use the known pixel value of the background
to distinguish the background from the document image in the frame of
information. Once the scanner 25 has obtained the analog pixel data,
then scanner 25 will convert the analog pixel data to digital pixel
data with an analog-to-digital ("A/D") converter (not shown) and then
will output the digital pixel data in serial or parallel form to
computer 28. A scanner assembly 22, such as the Imagelink 9XXX Series
manufactured by Eastman Kodak Company, could be used.
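
By way of illustration only, the use of the known background pixel value to distinguish the document image from the background might be sketched as below. The function name `locate_document` and the `tolerance` parameter are hypothetical, and the frame is assumed to be a list of rows of digital grey scale values.

```python
def locate_document(frame, background, tolerance=0):
    """Return (top, left, bottom, right) of the pixels that differ from the
    known background value, or None if the frame holds only background."""
    rows = [r for r, row in enumerate(frame)
            if any(abs(p - background) > tolerance for p in row)]
    cols = [c for row in frame for c, p in enumerate(row)
            if abs(p - background) > tolerance]
    if not rows:
        return None
    return rows[0], min(cols), rows[-1], max(cols)
```

A nonzero `tolerance` would allow for small variations in the scanned background; the value to use is not given in this description.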

System 20 also includes storage device 24. Storage 
device 24 is a memory device, such as a 68000 ADL, 
which stores the digital pixel data which represent the 
frames of information, the attributes for each document 
image, and the evaluation results of document images. 
Although not shown, storage device 24 could be incor- 
porated within programmed computer 28. 

Programmed computer 28 includes a central processing unit ("CPU") 29,
a random access memory ("RAM") 31, a read only memory ("ROM") 33,
input/output devices ("I/O") 35, 37, and 39, a display 38, and a
keyboard 26 which are all coupled to a bus 41. I/O 35 is coupled to
scanner 25 and receives the frames of information from scanner 25, I/O
37 is coupled to storage device 24 and outputs and can retrieve frames
of information, attributes, and evaluation results, and I/O 39 is
coupled to display 38 and keyboard 26 which can receive and output
information on the assessment, threshold values, and criteria to
evaluate differences. The set
termined. Once angles θ₁, θ₂, θ₃, and θ₄ are obtained, the differences
between various combinations of angles are calculated to provide
attributes of rectangularity (Step 72). In this particular embodiment,
θ₁ minus θ₃, θ₂ minus θ₄, θ₂ minus θ₁, θ₂ minus θ₃, θ₁ minus θ₄, and
θ₄ minus θ₃ are the attributes of rectangularity.

Once the attributes of rectangularity are obtained (Step 30), then the
threshold value set for each attribute by the operator is selected
(Step 32). Once the threshold value is selected, the difference between
each attribute of rectangularity and the threshold value is determined
(Step 34). In this particular embodiment, the threshold value for the
difference between angles for opposing edges of document image 74
should be zero and the difference in orientation between angles for
adjacent edges of document image 74 should be 90°. Specifically, the
threshold values for θ₁ minus θ₃ and θ₂ minus θ₄ should be zero and the
threshold values for θ₂ minus θ₁, θ₂ minus θ₃, θ₁ minus θ₄, and θ₄
minus θ₃ should be 90°. The differences between the attributes of
rectangularity and threshold values are then evaluated using
predetermined criteria (Step 34). In this particular embodiment, the
predetermined criteria are set to accept a difference of up to 2°
between the attributes of rectangularity and the threshold values. If
the difference is less than 2°, then the evaluation results signal that
the attribute of rectangularity is acceptable. For example, if θ₁ minus
θ₃ was 1° then the evaluation results would signal that the attribute
of rectangularity is acceptable, but if θ₁ minus θ₃ was 3° then the
evaluation results would signal that the attribute of rectangularity is
unacceptable. As discussed earlier, the particular criteria used can
vary as needed and desired. Examples of criteria used in this and other
examples set forth in FIGS. 3-10 for each attribute are intended to be
illustrative and not exhaustive.
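
The rectangularity evaluation described above can be sketched as follows. This is an illustrative sketch only: the function name `rectangularity_ok` is hypothetical, angles are assumed to be in degrees, and the magnitude of each angle difference is compared with its threshold, which is an assumption since the sign convention for the angles is not stated here.

```python
def rectangularity_ok(t1, t2, t3, t4, tolerance=2.0):
    """Evaluate the six rectangularity attributes against their thresholds."""
    opposing = [t1 - t3, t2 - t4]                    # threshold: 0 degrees
    adjacent = [t2 - t1, t2 - t3, t1 - t4, t4 - t3]  # threshold: 90 degrees
    # Assumption: only the magnitude of each angle difference matters.
    differences = [abs(abs(a) - 0.0) for a in opposing]
    differences += [abs(abs(a) - 90.0) for a in adjacent]
    return all(d < tolerance for d in differences)
```

With edge angles of 0°, 90°, 1°, and 90° the opposing-edge difference is 1° and the image is accepted; with a third angle of 3° it is rejected, matching the example in the text.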

Referring to FIG. 3(a), a flow chart illustrates the process for
obtaining, comparing, and evaluating the attribute of linearity. First,
when examining the attribute of linearity, at least one edge of the
document image is located in the frame of information (Step 40).
Techniques for locating one edge of a document image in a frame of
information are well known and thus will not be described here. Next,
three sample points are placed along the located edge (Step 42). In
FIG. 3(b), a frame of information 44 with a document image 46 against a
fixed background is illustrated. Document image 46 has three sample
points located along one edge 48 which are assigned labels A, B, and C.
In FIG. 3(c), a second frame of information 50 with a document image 52
with a tear 54 is illustrated. Document image 52 has three sample
points located along one edge which are assigned labels A', B', and C'.
Although sample points A, B, and C and A', B', and C' are only located
along one edge 48 and 56 in this example, the sample points could be
located and the process performed along each edge of document images 46
and 52. Additionally, more than three sample points could be used if
desired. Next, the sample points on each document image 46 and 52 are
located (Step 58) and lines are drawn between each combination of two
points (Step 60). Accordingly, in FIG. 3(b) a line is drawn between
points A and B to form line AB, between points A and C to form line AC,
and between B and C to form line BC. In FIG. 3(c) a line is drawn
between points A' and B' to form line A'B', between points A' and C' to
form line A'C', and between points B' and C' to form line B'C'. Next,
the angles of each line AB, AC, BC, A'B', A'C', and B'C' with respect
to a coordinate system based on the frames of information 44 and 50 are
determined (Step 62) and then the angles for each line in each frame of
information 44 and 50 are compared to determine if they are equal (Step
64). The process used for determining the angles for each line is the
same as that discussed earlier with respect to FIG. 2(b) and thus is
not repeated again here. In FIG. 3(b), the angles for each of lines AB,
AC, and BC are equal, while in FIG. 3(c), the angles for each of lines
A'B', A'C', and B'C' are not all equal because of the tear 54. The
difference between the angles for each set of two lines in each frame
of information 44 and 50 is averaged and this average value is the
attribute of linearity for the document images 46 and 52.
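
A minimal sketch of the linearity attribute described above is given below. The function name `linearity` is hypothetical, the sample points are assumed to be (x, y) pixel coordinates listed in order along the edge, and the use of the absolute difference between angles is an assumption.

```python
import math
from itertools import combinations


def linearity(points):
    """Average absolute difference, in degrees, between the angles of all
    lines joining pairs of sample points (Steps 60 to 64)."""
    angles = [math.degrees(math.atan2(y2 - y1, x2 - x1))
              for (x1, y1), (x2, y2) in combinations(points, 2)]
    differences = [abs(a - b) for a, b in combinations(angles, 2)]
    return sum(differences) / len(differences)
```

For three points on a straight edge the lines AB, AC, and BC share one angle and the attribute is zero; a point displaced by a tear makes the angles differ and raises the average.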

Once the attribute of linearity is obtained (Step 30), a threshold
value for the attribute is selected from a database in RAM 31 or ROM 33
(Step 32). Once the threshold value is selected, the difference between
the attribute of linearity for each frame of information 44 and 50 and
the threshold value is determined and then a predetermined set of
criteria is used to evaluate each difference and to provide evaluation
results (Step 34). In this particular embodiment, the threshold value
is 0° and the predetermined criteria are set to allow up to 2°
difference between the attribute of linearity and the threshold value.
If the difference is less than 2°, then the evaluation results signal
that the attribute of linearity is acceptable.

Referring to FIG. 4(a), a flow chart illustrates the process for
obtaining, comparing, and evaluating the attribute of corner location.
First, when examining the attribute of corner location, the edges A, B,
C, and D of a document image 78 in a frame of information 80 are
detected (Step 82), as shown in FIG. 4(b). Once each edge A, B, C, and
D of document image 78 is identified, the coordinates for the expected
corners EC₁, EC₂, EC₃, and EC₄ for document image 78 are calculated
(Step 84). The document which document image 78 in FIG. 4(b)
represents has an upper left-hand corner 86 which was bent when scanned
by scanner assembly 22. The dotted lines illustrate where the corner is
expected to be. Next, the coordinates for actual corners AC₁, AC₂, AC₃,
and AC₄ for document image 78 are detected by looking for the first
light to dark transition and then dark to light transition in each row
of pixels in document image 78 (Step 88). Finally, the distance between
the coordinates for actual and expected corners is determined and the
distance is the attribute of corner location for each corner of
document image 78 (Step 90).
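
The final step above can be illustrated with the short sketch below, which assumes that the expected and actual corner coordinates have already been found and that the distance is Euclidean; the description does not state which distance measure is used, and the function name `corner_location` is hypothetical.

```python
import math


def corner_location(expected, actual):
    """Distance between each expected corner and the matching actual corner."""
    return [math.dist(e, a) for e, a in zip(expected, actual)]
```

A bent corner shows up as a nonzero distance for that corner while the other three remain at zero.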
other line (Step 152). Collinearity is then checked again (Step 148).
If the probes are collinear, then the angle of the line formed by the
probes is determined and is taken as the attribute of line skew (Step
150), as shown in FIG. 7(d). If collinearity is not found, then the
process continues (Step 152) until the line is found, or all probes
128, 130, 132, and 134 are moved past a preset point in the document.
In this particular embodiment, the preset point is considered to be
halfway down document image 136.

Once the attribute of line skew is obtained (Step 30), a threshold
value for the attribute is selected from a database in RAM 31 or ROM 33
(Step 32). In this particular embodiment, the threshold value for the
attribute of line skew is 0 degrees. Once the threshold value is
selected, the difference between the attribute of line skew and the
threshold value is determined and then the difference is evaluated
using predetermined criteria to provide evaluation results (Step 34).
In this particular embodiment, the predetermined criteria will allow a
line skew of up to half a degree. If the difference between the
attribute of line skew and the threshold value is less than half a
degree, then the evaluation results signal that the attribute of line
skew is acceptable. If desired, system 20 can be programmed to correct
the document image for any line skew.
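
The collinearity check and the angle determination for the probes can be sketched as follows. This is an illustrative sketch under stated assumptions: each probe position is an (x, y) pixel coordinate where that probe detected a line, the probes sit at distinct x positions, and the `tolerance` of one pixel is a hypothetical value not taken from this description.

```python
import math


def collinear(points, tolerance=1.0):
    """True if every probe lies within `tolerance` pixels of the line through
    the first and last probes (assumed collinearity test)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return all(
        abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length <= tolerance
        for x, y in points
    )


def line_skew(points):
    """Angle, in degrees, of the line through the first and last probes."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```

Four probes resting on a text line that drops 3 pixels over 300 pixels are collinear and give a skew of roughly 0.57°, which would fall outside the half-degree criterion of this embodiment.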

Referring to FIG. 8, a flow chart illustrates the process for
obtaining, comparing, and evaluating the attribute of average character
confidence for the entire document image. First, characters within the
document image are located and processed to separate the characters
from each other (Step 154). Next, optical character recognition (OCR)
is performed on each character to provide an average character
confidence for each character (Step 156). In this step, system 20 and
method identify each different type of character in the document image
and then run comparisons of each different character against stored
characters. A percentage likelihood of each of the identified
characters matching a stored character is determined and then the
average of those percentages for each character is the average
character confidence for each character. System 20 and method then
average together the average character confidence for all of the
characters in the document image to provide the attribute of average
character confidence for the entire document image (Step 158). The
attribute of average character confidence provides an indication of the
condition of overall character confidence in the document image and
thus an indication of the quality of the document image itself.

Once the attribute of average character confidence is obtained (Step
30), a threshold value for the attribute is selected from a database in
RAM 31 or ROM 33 (Step 32). In this particular embodiment, the
threshold value for average character confidence is 100%. Once the
threshold value is selected, the difference between the attribute of
average character confidence and the threshold value is determined and
the difference is evaluated using predetermined criteria to provide
evaluation results (Step 34). In this particular embodiment, the
predetermined criteria will accept a difference of up to 1.5 standard
deviations away from the threshold value. If the difference is within
1.5 standard deviations, then the evaluation results signal that the
attribute of average character confidence is acceptable.
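
The averaging and evaluation of character confidence might be sketched as below. The function names are hypothetical, the OCR match percentages are assumed to be supplied by a separate recognition step, and the standard deviation `sigma` is assumed to come from reference data, since its source is not stated in this description.

```python
from statistics import mean


def average_character_confidence(per_character_scores):
    """Average the per-character average confidences (Steps 156 and 158)."""
    return mean(mean(scores) for scores in per_character_scores)


def confidence_ok(attribute, sigma, threshold=100.0, k=1.5):
    """Accept if the attribute is within k standard deviations of the threshold."""
    return abs(attribute - threshold) <= k * sigma
```

For two characters with match percentages of 90 and 100, and 80 and 100, the per-character averages are 95 and 90 and the attribute for the image is 92.5.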

Referring to FIG. 9, a flow chart illustrates the process for
obtaining, comparing, and evaluating the attribute of expected
contrast. First, the numbers of black and white pixels in each row of
the document image are counted (Step 94). Next, the ratio of black to
white pixels
for the entire document image is calculated to obtain the 
attribute of expected contrast (Step 96). Form docu- 
ments, such as tax forms, will have the same ratio of 
black to white pixels in the document images each time 
the documents are scanned. 

Once the attribute of expected contrast is obtained 
(Step 30), a threshold value for the attribute is selected 
from a database in RAM 31 or ROM 33 (Step 32). In this 
particular embodiment, the threshold value is a mean 
ratio of black to white pixels obtained from scanning a 
number of the same type of documents. Once the 
threshold value is selected, the difference between the 
attribute of expected contrast and the threshold value is 
determined and then the difference is evaluated using 
predetermined criteria to provide evaluation results 
(Step 34). In this particular embodiment, the predeter- 
mined criteria will accept a difference of up to one stand- 
ard deviation above or below the threshold value. If the 
difference is within one standard deviation, then the 
evaluation results signal that the attribute of expected 
contrast is acceptable. 
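
The expected contrast attribute and its evaluation can be sketched as follows. This sketch assumes a binarized image in which 0 denotes black and 1 denotes white, and an image containing at least one white pixel; the function names and the list of reference ratios are hypothetical.

```python
from statistics import mean, stdev


def expected_contrast(image):
    """Ratio of black (0) to white (1) pixels, counted row by row
    (Steps 94 and 96)."""
    black = sum(row.count(0) for row in image)
    white = sum(row.count(1) for row in image)
    return black / white


def contrast_ok(attribute, reference_ratios):
    """Accept if within one standard deviation of the mean reference ratio."""
    threshold = mean(reference_ratios)
    return abs(attribute - threshold) <= stdev(reference_ratios)
```

The reference ratios stand in for the ratios obtained from scanning a number of documents of the same type.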

Referring to FIG. 10, a flow chart illustrates the proc- 
ess for obtaining, comparing, and evaluating the at- 
tribute of sharpness. The attribute of sharpness pro- 
vides an indication of how blurred the document image 
may be and whether the document image is suitable for 
further processing. First, the document image is located 
in the frame of information (Step 108). Once the document image is
located, then the frequency of black to white pixels per line of the
document image is obtained and is used as the attribute of sharpness
(Step 110).
Each type of form document, such as a tax form, will 
have a unique frequency of black to white pixels. 
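
One possible reading of the sharpness attribute is sketched below, in which the "frequency of black to white pixels per line" is interpreted as the mean number of black-to-white transitions per row. This interpretation is an assumption, as is the binarized image format (0 for black, 1 for white); the function name `sharpness` is hypothetical.

```python
def sharpness(image):
    """Mean number of black (0) to white (1) transitions per row."""
    transitions = [
        sum(1 for a, b in zip(row, row[1:]) if a == 0 and b == 1)
        for row in image
    ]
    return sum(transitions) / len(transitions)
```

Under this reading, a blurred scan of a form would tend to show fewer clean transitions per line than a sharp scan of the same form.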

Once the attribute of sharpness is obtained (Step 
30), a threshold value for the attribute is selected from 
a database in RAM 31 or ROM 33 (Step 32). In this par- 
ticular embodiment, the threshold value is a mean fre- 
quency obtained from scanning a number of the same 
type of documents. Once the threshold value is selected, the
difference between the attribute of sharpness and the threshold value
is determined and the difference is evaluated using predetermined
criteria to provide evaluation results (Step
34). In this particular embodiment, the predetermined 
criteria will accept a difference of up to 10% from the 
threshold value. If the difference is within 10%, then the 
evaluation results signal that the attribute of sharpness 
locating the sample points;
analyzing sets of three points in each section for collinearity; and
determining the angle of a line connecting the three most collinear
points to obtain an attribute of sheet skew.

10. The method as set forth in Claim 1 further comprising the steps of:

processing the document image to obtain attributes related to line
skew and average character confidence in the document image;
selecting a threshold value from a database for each of the obtained
attributes; and
comparing each of the obtained attributes against the threshold value
selected for the obtained attribute to determine the difference for
each and then evaluating one or more of the differences using
predetermined criteria to provide evaluation results of the condition
of the text of the document image.

11. The method as set forth in Claim 10 wherein the step of processing
the document image to obtain the attribute of line skew comprises:

detecting the location of the document image;
generating a bounding box around the document image;
locating at least three probes along the top edge of the bounding box;
moving each of the probes down one row of pixels at a time in the
document image until each of the probes detects a line;
determining if each of the three probes is collinear;
moving the probe which has moved the least to the next line if the
probes are not collinear; and
determining the angle of the line when the probes are collinear to
obtain the attribute of line skew.

12. The method as set forth in Claim 10 wherein the step of processing
the document image to obtain the attribute of average character
confidence comprises:

detecting the location of the document image;
locating and processing each character in the document image;
performing optical character recognition on each located character and
obtaining an average character confidence for each character; and
averaging all of the average character confidence for each character
obtained to obtain the attribute of average character confidence for
the document image.

13. The method as set forth in Claim 1 further comprising the steps of:

processing the document image to obtain attributes related to expected
contrast and sharpness;
selecting a threshold value for each of the obtained attributes; and
comparing each of the obtained attributes against the threshold value
selected for the obtained attribute to determine a difference for each
and then evaluating one or more of the differences using predetermined
criteria to provide evaluation results of the condition of the
document image with respect to a fixed reference.

14. The method as set forth in Claim 13 wherein the step of processing
the document image to obtain the attribute of expected contrast
comprises:

detecting the location of the document image; and
counting the number of black and white pixels in the document image to
obtain the attribute of expected contrast.

15. The method as set forth in Claim 13 wherein the step of processing
the document image to obtain the attribute of sharpness comprises:

detecting the location of the document image; and
determining the frequency of black to white pixels per line of the
document image to obtain the attribute of sharpness.

16. A system for assessing a document image, the system comprising:

means for processing one or more of the document images to obtain one
or more attributes related to the geometrical integrity of each of the
document images;
means for selecting a threshold value for each of the attributes from
a memory; and
means for comparing each of the obtained attributes against the
threshold value selected for the obtained attribute to determine the
difference for each and then evaluating one or more differences to
provide evaluation results on the geometrical integrity of the
document image.

17. The system as set forth in Claim 16 further comprising a scanner
assembly for scanning documents to obtain the document images, the
scanner assembly